Sensitivity Analysis and Explanations for Robust Query Evaluation in Probabilistic Databases
ثبت نشده
چکیده
Probabilistic database systems have successfully established themselves as a tool for managing uncertain data. However, much of the research in this area has focused on efficient query evaluation and has largely ignored two key issues that commonly arise in uncertain data management: First, how to provide explanations for query results, e.g., “Why is this tuple in my result ?” or “Why does this output tuple have such high probability ?”. Second, the problem of determining the sensitive input tuples for the given query, e.g., users are interested to know the input tuples that can substantially alter the output, when their probabilities are modified (since they may be unsure about the input probability values). Existing systems provide the lineage/provenance of each of the output tuples in addition to the output probabilities, which is a boolean formula indicating the dependence of the output tuple on the input tuples. However, it does not immediately provide a quantitative relationship and it is not informative when we have multiple output tuples. In this paper, we propose a unified framework that can handle both the issues mentioned above and facilitate robust query processing. We formally define the notions of influence and explanations and provide algorithms to determine the top-` influential set of variables and the top-` set of explanations for a variety of queries, including conjunctive queries, probabilistic threshold queries, top-k queries and aggregation queries. Further, our framework naturally enables highly efficient, incremental evaluation when the input probabilities are modified, i.e., if the user decides to change the probability of an input tuple (e.g., if the uncertainty is resolved). Our preliminary experimental results demonstrate the benefits of our framework for performing robust query processing over probabilistic databases.
منابع مشابه
Scalable Statistical Modeling and Query Processing over Large Scale Uncertain Databases
Title of Dissertation: SCALABLE STATISTICAL MODELING AND QUERY PROCESSING OVER LARGE SCALE UNCERTAIN DATABASES Bhargav Kanagal Shamanna Doctor of Philosophy, 2011 Dissertation directed by: Dr. Amol Deshpande Dept. of Computer Science The past decade has witnessed a large number of novel applications that generate imprecise, uncertain and incomplete data. Examples include monitoring infrastructu...
متن کاملMost Probable Explanations for Probabilistic Database Queries (Extended Abstract)
Probabilistic databases (PDBs) have been widely studied in the literature, as they form the foundations of large-scale probabilistic knowledge bases like NELL and Google’s Knowledge Vault. In particular, probabilistic query evaluation has been investigated intensively as a central inference mechanism. However, despite its power, query evaluation alone cannot extract all the relevant information...
متن کاملMost Probable Explanations for Probabilistic Database
Probabilistic databases (PDBs) have been widely studied in the literature, as they form the foundations of large-scale probabilistic knowledge bases like NELL and Google’s Knowledge Vault. In particular, probabilistic query evaluation has been investigated intensively as a central inference mechanism. However, despite its power, query evaluation alone cannot extract all the relevant information...
متن کاملMost Probable Explanations for Probabilistic Database Queries
Forming the foundations of large-scale knowledge bases, probabilistic databases have been widely studied in the literature. In particular, probabilistic query evaluation has been investigated intensively as a central inference mechanism. However, despite its power, query evaluation alone cannot extract all the relevant information encompassed in large-scale knowledge bases. To exploit this pote...
متن کاملRead-Once Functions and Query Evaluation in Probabilistic Databases
Probabilistic databases hold promise of being a viable means for large-scale uncertainty management, increasingly needed in a number of real world applications domains. However, query evaluation in probabilistic databases remains a computational challenge. Prior work on efficient exact query evaluation in probabilistic databases has largely concentrated on query-centric formulations (e.g., safe...
متن کامل